Sains Malaysiana 53(4)(2024): 907-920

http://doi.org/10.17576/jsm-2024-5304-14

 

A Remedial Measure of Multicollinearity in Multiple Linear Regression in the Presence of High Leverage Points

(Pemulihan Ukuran Multikolinearan dalam Model Regresi Linear Berganda dengan Kehadiran Titik Terpencil)

 

SHELAN SAIED ISMAEEL1, HABSHAH MIDI2,* & KURDISTAN M. TAHER OMAR1 

 

1Department of Mathematics, Faculty of Science, University of Zakho, Iraq

 2Faculty of Science and Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia

 

Received: 14 March 2023 / Accepted: 5 March 2024

 

Abstract

The ordinary least squares (OLS) method is the most widely used estimation method for the multiple linear regression model, owing to tradition and its optimal properties. Nonetheless, in the presence of multicollinearity, the OLS method is inefficient because the standard errors of its estimates become inflated. Many methods, including Jackknife Ridge Regression (JAK), have been proposed to remedy this problem. However, JAK performs poorly when both multicollinearity and high leverage points (HLPs), i.e., outlying observations in the X-direction, are present in the data. As a solution to this problem, the Robust Jackknife Ridge MM (RJMM) and Robust Jackknife Ridge GM2 (RJGM2) estimators were put forward. Nevertheless, they are still not very efficient: they suffer from long computational running times, retain some bias, and lack the bounded-influence property. This paper proposes a robust Jackknife ridge regression that integrates a generalized M estimator and the fast improvised GT (GM-FIMGT) estimator in its construction. We name this method the robust Jackknife ridge regression based on GM-FIMGT, denoted RJFIMGT. Numerical results show that the proposed RJFIMGT method is the best among the methods considered, as it attains the smallest RMSE and bias.
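The core problem described above — OLS coefficient estimates becoming unstable when predictors are nearly collinear, and ridge regression stabilizing them — can be illustrated with a small simulation. This is a minimal sketch for illustration only: it uses plain closed-form ridge with an arbitrary penalty k = 1, not the authors' RJFIMGT estimator, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_rep = 50, 3, 300
beta = np.ones(p)

def simulate(rho):
    # Predictors with pairwise correlation rho; rho near 1 => multicollinearity.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta + rng.standard_normal(n)
    return X, y

def ols(X, y):
    # Ordinary least squares via a numerically stable least-squares solve.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, k=1.0):
    # Closed-form ridge estimator: (X'X + kI)^{-1} X'y, with an arbitrary k.
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

sd = {}
for rho in (0.1, 0.99):
    for name, est in (("ols", ols), ("ridge", ridge)):
        b = np.array([est(*simulate(rho)) for _ in range(n_rep)])
        # Monte Carlo sampling standard deviation, averaged over coefficients.
        sd[name, rho] = b.std(axis=0).mean()

print(sd)  # OLS variability explodes at rho = 0.99; ridge stays bounded.
```

Under severe collinearity the OLS sampling variability grows sharply, while the ridge estimator trades a little bias for a much smaller variance — the motivation for all the ridge-type estimators discussed in the paper.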

 

Keywords: High leverage points; jackknife; MM-estimator; multicollinearity; ridge regression

 

Abstrak

Kaedah kuasadua terkecil sering digunakan dalam model linear regresi berganda kerana tradisi dan sifatnya yang optimal. Walau bagaimanapun, dalam kehadiran multikolinearan, kaedah OLS tidak cekap disebabkan penganggar ralat piawai menjadi besar. Banyak kaedah telah dicadangkan bagi mengatasi masalah ini termasuk kaedah Jackknife Ridge Regression (JAK). Namun, prestasi kaedah JAK sangat lemah dengan kehadiran multikolinearan dan titik tuasan tinggi iaitu cerapan terpencil dalam arah X . Sebagai penyelesaian bagi masalah ini, penganggar Robust Jackknife Ridge MM (RJMM) dan penganggar Jackknife Ridge GM2 (RJGM2) di ketengahkan. Walau bagaimanapun, kaedah ini masih tidak cukup cekap kerana mereka mengambil masa pengiraan yang panjang, mempunyai unsur kepincangan dan tidak mempunyai sifat pengaruh terbatas.  Kertas ini mencadangkan kaedah robust Jackknife ridge regression yang menggabungkan penganggar- M teritlak (GM) dan penganggar pantas terubah suai GT (GM- FIMGT) dalam membangunkannya. Kaedah ini dinamakan robust Jackknife ridge regression berdasarkan GM-FIMGT, ditandakan dengan RJFIMGT. Keputusan berangka menunjukkan bahawa kaedah RJFIMGT yang dicadangkan adalah yang terbaik kerana ia mempunyai nilai RMSE dan pincang terkecil berbanding dengan kaedah lain dalam kajian ini.

 

Kata kunci: Jackknife; multikolinearan; penganggar MM; regresi ridge; titik tuasan tinggi
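The jackknife component of JAK-type estimators goes back to Quenouille's bias correction, cited in the references below. As a self-contained illustration (not the paper's ridge version), the sketch applies the correction to the plug-in variance estimator, for which the correction recovers the unbiased sample variance exactly.

```python
import numpy as np

def jackknife(estimator, x):
    # Quenouille's jackknife: theta_J = n*theta - (n-1)*mean(leave-one-out).
    n = len(x)
    theta = estimator(x)
    loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * theta - (n - 1) * loo.mean()

rng = np.random.default_rng(1)
x = rng.standard_normal(30)

plug_in = x.var()                          # biased MLE (divides by n)
corrected = jackknife(lambda s: s.var(), x)
unbiased = x.var(ddof=1)                   # divides by n - 1

print(plug_in, corrected, unbiased)        # corrected equals unbiased
```

The same leave-one-out construction, applied to the ridge estimator instead of the variance, yields the jackknifed ridge estimators that this paper robustifies.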

 

RUJUKAN

Akdeniz Duran, E. & Akdeniz, F. 2012. Efficiency of the modified jackknifed Liu-type estimator. Statistical Papers 53(2): 265-280.

Alguraibawi, M., Midi, H. & Rana, S. 2015. Robust jackknife ridge regression to combat multicollinearity and high leverage points. Economic Computation and Economic Cybernetics Studies and Research 49(4): 305-322.

Askin, R.G. & Montgomery, D.C. 1980. Augmented robust estimators. Technometrics 22: 333-341.

Bagheri, A. & Midi, H. 2015. Diagnostic plot for the identification of high leverage collinearity-influential observations. Statistics and Operations Research Transactions 39: 51-70.

Batah, F.S., Ramanathan, T.V. & Gore, S.D. 2008. The efficiency of modified jackknife and ridge type regression estimators: A comparison. Surveys in Mathematics and its Applications 3: 111-122.

Belsley, D.A., Kuh, E. & Welsch, R.E. 2004. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons Inc.

Brown, P.J. 1977. Centering and scaling in ridge regression. Technometrics 19(1): 35-36.

Dhhan, W., Rana, S. & Midi, H. 2016. A high breakdown, high efficiency and bounded influence modified GM estimator based on support vector regression. Journal of Applied Statistics 44(4): 700-714. https://doi.org/10.1080/02664763.2016.1182133

Groß, J. 2003. Linear Regression (Lecture Notes in Statistics). Berlin, Heidelberg: Springer-Verlag.

Hinkley, D.V. 1977. Jackknifing in unbalanced situations. Technometrics 19(3): 285-292.

Hoerl, A.E. & Kennard, R.W. 1970. Ridge regression: Biased estimation for non-orthogonal problems. Technometrics 12(1): 55-67.

Huber, P.J. 2004. Robust Statistics. New York: John Wiley & Sons.

Jadhav, N.H. & Kashid, D.N. 2011. A jackknifed ridge M-estimator for regression model with multicollinearity and outliers. Journal of Statistical Theory and Practice 5: 659-673.

Kutner, M.H., Nachtsheim, C.J., Neter, J. & Li, W. 2005. Applied Linear Regression Models. 5th ed. New York: McGraw-Hill.

Lawrence, K. & Arthur, J. 1990. Robust Regression Analysis and Applications. New York: Marcel Dekker Inc. pp. 59-86.

Li, G. & Chen, Z. 1985. Projection-pursuit approach to robust dispersion matrices and principal components: Primary theory and Monte Carlo. Journal of the American Statistical Association 80: 759-766.

Lim, H.A. & Midi, H. 2016. Diagnostic robust generalized potential based on index set equality (DRGP (ISE)) for the identification of high leverage points in linear model. Computational Statistics 31(3): 859-877.

Maronna, R.A., Martin, R.D. & Yohai, V.J. 2006. Robust Statistics: Theory and Methods. New York: Wiley.

Midi, H. & Zahari, M. 2007. A simulation study on ridge regression estimators in the presence of outliers and multicollinearity. Jurnal Teknologi 47(C): 59-74.

Midi, H., Hendi, T.H., Arasan, J. & Uraibi, H. 2020. Fast and robust diagnostic technique for the detection of high leverage points. Pertanika Journal of Science & Technology 28(4): 1203-1220.

Midi, H., Ismaeel, S.S., Arasan, J. & Mohammad, A.M. 2021. Simple and fast generalized M (GM) estimator and its application to real data. Sains Malaysiana 50(3): 859-867.

Montgomery, D.C., Peck, E.A. & Vining, G.G. 2001. Introduction to Linear Regression Analysis. 3rd ed. New York: John Wiley and Sons.

Penrose, K.W., Nelson, A. & Fisher, A. 1985. Generalized body composition prediction equation for men using simple measurement techniques. Medicine & Science in Sports & Exercise 17(2): 189.

Pison, G., Rousseeuw, P.J., Filzmoser, P. & Croux, C. 2003. Robust factor analysis. Journal of Multivariate Analysis 84(1): 145-172.

Quenouille, M.H. 1956. Notes on bias in estimation. Biometrika 43(3-4): 353-360.

Rashid, A.M., Midi, H., Dhhan, W. & Arasan, J. 2021. An efficient estimation and classification methods for high dimensional data using robust iteratively reweighted SIMPLS algorithm based on nu-support vector regression. IEEE Access 9: 45955-45967.

Rousseeuw, P.J. 1984. Least median of squares regression. Journal of the American Statistical Association 79: 871-880.

Rousseeuw, P.J. & Van Driessen, K. 1999. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41: 212-223.

Shah, I., Sajid, F., Ali, S., Rehman, A., Bahaj, A.A. & Fati, S.M. 2021. On the performance of jackknife based estimators for ridge regression. IEEE Access 9: 68044-68053.

Simpson, D.G., Ruppert, D. & Carroll, R.J. 1992. On one-step GM estimates and stability of influences in linear regression.  Journal of the American Statistical Association 87: 439-450.

Singh, B., Chaubey, Y.P. & Dwivedi, T.D. 1986. An almost unbiased ridge estimator. Sankhyā: The Indian Journal of Statistics, Series B 48: 342-346.

Zahariah, S. & Midi, H. 2023. Minimum regularized covariance determinant and principal component analysis-based method for the identification of high leverage points in high dimensional sparse data. Journal of Applied Statistics 50(13): 2817-2835.

Zahariah, S., Midi, H. & Mustafa, M.S. 2021. An improvised SIMPLS estimator based on MRCD-PCA weighting function and its application to real data. Symmetry 13(11): 2211.

 

*Corresponding author; email: habshah@upm.edu.my
